Multiple-F0 tracking based on a high-order HMM model
This paper addresses multiple-F0 tracking and the estimation of the number of harmonic source streams in music sound signals. A source stream is understood as generated by a note played by a musical instrument. A note is described by a hidden Markov model (HMM) having two states: the attack state and the sustain state. It is proposed to first track the F0 candidates using a high-order hidden Markov model, based on a forward-backward dynamic programming scheme. The propagated weights are calculated in the forward tracking stage, followed by an iterative tracking of the most likely trajectories in the backward tracking stage. The underlying source streams are then estimated by iteratively pruning the candidate trajectories in a maximum likelihood manner. The proposed system is evaluated on a specially constructed polyphonic music database. Compared with frame-based estimation systems, the tracking mechanism significantly improves the accuracy rate.
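For intuition, the sketch below shows a first-order, Viterbi-style version of the forward-backward idea: propagated weights are accumulated over per-frame F0 candidates in a forward pass, and the most likely trajectory is read back through stored pointers. The paper's actual model is a high-order HMM with attack/sustain note states and iterative trajectory pruning; the candidate lists, salience scores, and jump penalty here are illustrative assumptions.

```python
import numpy as np

def track_f0(candidates, saliences, jump_penalty=2.0):
    """candidates: per-frame 1-D arrays of candidate F0s in Hz (hypothetical).
    saliences: matching per-candidate salience scores."""
    T = len(candidates)
    weight = [np.asarray(saliences[0], dtype=float)]   # forward: propagated weights
    backptr = []
    for t in range(1, T):
        prev_f0 = np.asarray(candidates[t - 1])[None, :]   # shape (1, P)
        cur_f0 = np.asarray(candidates[t])[:, None]        # shape (C, 1)
        # transition score penalizes large log-frequency jumps
        trans = -jump_penalty * np.abs(np.log2(cur_f0 / prev_f0))
        total = weight[-1][None, :] + trans                # (C, P)
        backptr.append(np.argmax(total, axis=1))
        weight.append(np.asarray(saliences[t], dtype=float) + np.max(total, axis=1))
    # backward: read off the most likely trajectory
    path = [int(np.argmax(weight[-1]))]
    for t in range(T - 2, -1, -1):
        path.append(int(backptr[t][path[-1]]))
    path.reverse()
    return np.array([candidates[t][i] for t, i in enumerate(path)])

# toy usage: three frames with two F0 candidates each
cands = [np.array([220.0, 440.0])] * 3
sals = [np.array([0.4, 0.6]), np.array([0.9, 0.1]), np.array([0.8, 0.2])]
print(track_f0(cands, sals))   # favors the salient, smooth 220 Hz track
```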
On Stretching Gaussian Noises with the Phase Vocoder
Recently, the processing of non-sinusoidal signals, or sound textures, has become an important topic in various areas. In general, the transformation is done with phase vocoder techniques. Since the phase vocoder is based on a sinusoidal model, its performance is not satisfactory when applied to sound textures. This article investigates the problem using the most basic non-sinusoidal sounds, noise signals, as an example. We demonstrate the problems that arise when time stretching noise with the phase vocoder, describe some relevant statistical properties of the time-frequency representation of noise, and introduce an algorithm that preserves these statistical properties when time stretching noise with the phase vocoder. The resulting algorithm significantly improves the perceptual quality of time-stretched noise signals and is therefore seen as a promising first step towards an algorithm for the transformation of sound textures.
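As background, a minimal phase vocoder time stretch looks like the sketch below: magnitudes are interpolated at fractional frame positions while per-bin phase increments are propagated, which enforces exactly the sinusoidal phase coherence that degrades noise. This is a generic textbook formulation, not the paper's improved algorithm; the frame sizes and the white-noise test signal are assumptions.

```python
import numpy as np

def stft(x, win, hop):
    n = len(win)
    frames = [x[i:i + n] * win for i in range(0, len(x) - n + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(X, win, hop):
    n = len(win)
    out = np.zeros(hop * (len(X) - 1) + n)
    norm = np.zeros_like(out)
    for i, F in enumerate(X):                 # weighted overlap-add
        s = i * hop
        out[s:s + n] += np.fft.irfft(F, n) * win
        norm[s:s + n] += win ** 2
    return out / np.maximum(norm, 1e-12)

def pv_stretch(x, rate, n_fft=1024, hop=256):
    """rate < 1 stretches (output longer), rate > 1 compresses."""
    win = np.hanning(n_fft)
    X = stft(x, win, hop)
    mag, phase = np.abs(X), np.angle(X)
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) * hop / n_fft  # expected advance
    acc = phase[0].copy()
    Y = []
    for t in np.arange(0, len(X) - 1, rate):  # fractional analysis positions
        k, frac = int(t), t - int(t)
        Y.append(((1 - frac) * mag[k] + frac * mag[k + 1]) * np.exp(1j * acc))
        dphi = phase[k + 1] - phase[k] - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))  # wrap to [-pi, pi]
        acc += omega + dphi                   # propagate per-bin phase
    return istft(np.array(Y), win, hop)

# stretching white noise by 2x: the result sounds unnaturally coherent
stretched = pv_stretch(np.random.randn(4 * 44100), rate=0.5)
```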
Source Filter Model For Expressive Gu-Qin Synthesis and its iOS App
The Gu-Qin, a venerable Chinese plucked-string instrument, has unique performance techniques and enchanting sounds. It is on the UNESCO Representative List of the Intangible Cultural Heritage of Humanity and is one of the oldest Chinese solo instruments. The variation in Gu-Qin sounds is so large that carefully designed controls are necessary for its computer synthesizer. We developed a parametric source-filter model for re-synthesizing expressive Gu-Qin notes, designed to cover as many combinations of the Gu-Qin's performance techniques as possible. In this paper, we briefly discuss Gu-Qin playing and its special tablature notation to explain the relationship between the performance techniques and the resulting sounds. This work includes a Gu-Qin musical notation system and a source-filter model based synthesizer. In addition, we implement an iOS app to demonstrate the model's low computational complexity and robustness. Its friendly user interface makes it easy to improvise with the sounds.
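As a rough illustration of the source-filter idea, the sketch below drives a single all-pole resonance with a decaying impulse-train excitation. It is not the paper's Gu-Qin model; the pitch, decay constant, and resonance coefficients are arbitrary placeholders.

```python
import numpy as np
from scipy.signal import lfilter

fs = 44100
f0, dur = 220.0, 2.0                       # placeholder pitch and duration
n = int(fs * dur)
t = np.arange(n) / fs

# source: impulse train with an exponential decay, a crude pluck excitation
source = np.zeros(n)
source[::int(round(fs / f0))] = 1.0
source *= np.exp(-3.0 * t)

# filter: one illustrative body resonance near 1 kHz (2nd-order all-pole)
w = 2 * np.pi * 1000.0 / fs
r = 0.995                                  # pole radius controls ringing
note = lfilter([1.0], [1.0, -2 * r * np.cos(w), r ** 2], source)
note /= np.abs(note).max()                 # normalize for playback
```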
On the Modeling of Sound Textures Based on the STFT Representation
Sound textures are often noisy and chaotic. The processing of these sounds must be based on the statistics of their time-frequency representations. In order to transform sound textures with existing mechanisms, a statistical model based on the STFT representation is favored. In this article, the relation between the statistics of a sound texture and its time-frequency representation is explored. We propose an algorithm to extract and modify the statistical properties of a sound texture based on its STFT representation. It allows us to extract the statistical model of a sound texture and resynthesize the texture after modifications have been made. It can also be used to generate new samples of the sound texture from a given sample. The results of the experiment show that the algorithm is capable of generating high-quality sounds from an extracted model. This result could serve as a basis for transformations like morphing or high-level control of sound textures.
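A heavily simplified version of this pipeline is sketched below: per-bin log-magnitude means and standard deviations are extracted from the STFT, and a new sample is drawn by matching these marginals under random phases. The paper's model captures more structure than these marginal statistics; the function names and parameters are assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def texture_stats(x, fs, n_fft=1024):
    """Extract per-bin log-magnitude mean and std from the STFT."""
    _, _, Z = stft(x, fs, nperseg=n_fft)
    logmag = np.log(np.abs(Z) + 1e-12)
    return logmag.mean(axis=1), logmag.std(axis=1)

def sample_texture(mean, std, n_frames, fs, n_fft=1024):
    """Draw a new texture sample matching the marginal statistics."""
    logmag = mean[:, None] + std[:, None] * np.random.randn(len(mean), n_frames)
    phase = 2 * np.pi * np.random.rand(len(mean), n_frames)
    _, y = istft(np.exp(logmag) * np.exp(1j * phase), fs, nperseg=n_fft)
    return y

fs = 44100
texture = np.random.randn(fs * 2)          # stand-in for a recorded texture
mu, sigma = texture_stats(texture, fs)
new_sample = sample_texture(mu, sigma, n_frames=400, fs=fs)
```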
TELPC Based Re-Synthesis Method for Isolated Notes of Polyphonic Instrumental Music Recordings
In this paper, we present a flexible analysis/re-synthesis method for smoothly changing the properties of isolated notes in polyphonic instrumental music recordings. The True Envelope Linear Predictive Coding (TELPC) method is employed as the analysis/synthesis model because its accurate spectral envelope estimation preserves the original timbre quality as much as possible. We modify the conventional LPC analysis/synthesis processing by using pitch-synchronous analysis frames to avoid a severe magnitude modulation problem. Smaller frames can thus be used to capture more local characteristics of the original signals and further improve the sound quality. Within this framework, one can manipulate sequences of isolated notes from two commercially available polyphonic instrumental music recordings, and interesting re-synthesized results are achieved.
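The sketch below shows the core analysis/re-synthesis loop for one frame using standard autocorrelation LPC: inverse filtering yields the residual, and the all-pole filter restores the frame. TELPC differs in that the envelope comes from the True Envelope estimator, and the paper further uses pitch-synchronous frames; the order and the random frame contents here are placeholders.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=20):
    """Autocorrelation-method LPC: solve the Toeplitz normal equations."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))      # A(z) coefficients

frame = np.random.randn(1024)               # stand-in for a windowed note frame
A = lpc(frame)
residual = lfilter(A, [1.0], frame)          # inverse filtering: e = A(z) x
resynth = lfilter([1.0], A, residual)        # 1/A(z) restores the frame
assert np.allclose(resynth, frame)           # exact up to floating-point error
```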
Timbre-Constrained Recursive Time-Varying Analysis for Musical Note Separation
Note separation in music signal processing becomes difficult when there are overlapping partials from co-existing notes produced by either the same or different musical instruments. To deal with this problem, it is necessary to incorporate certain invariant features of musical instrument sounds into the separation process; for example, the timbre of a note of a musical instrument may serve as one such invariant feature. In this paper, a timbre estimate is used to represent this feature, so that it becomes a constraint when note separation is performed on a mixture signal. To demonstrate the proposed method, a time-dependent recursive regularization analysis is employed. Spectral envelopes of different notes are estimated, and a modified parameter update strategy is applied to the recursive regularization process. The experimental results show that the flaws due to the overlapping-partial problem can be effectively reduced with the proposed approach.
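A minimal rendering of the constraint idea appears below: per-frame partial amplitudes are estimated by regularized least squares pulled toward a timbre template. This is a single-frame, closed-form stand-in for the paper's recursive time-varying regularization; the mixing matrix, template, and regularization weight are toy assumptions.

```python
import numpy as np

def constrained_frame_estimate(H, y, x_tmpl, lam=0.5):
    """Solve min_x ||y - H x||^2 + lam * ||x - x_tmpl||^2 in closed form."""
    A = H.T @ H + lam * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ y + lam * x_tmpl)

# toy mixture: 3 overlapping partial groups observed in a 64-bin spectrum
rng = np.random.default_rng(0)
H = np.abs(rng.standard_normal((64, 3)))    # hypothetical partial patterns
x_true = np.array([1.0, 0.5, 0.8])
y = H @ x_true + 0.01 * rng.standard_normal(64)
x_tmpl = np.array([0.9, 0.6, 0.7])          # timbre estimate from clean notes
x_hat = constrained_frame_estimate(H, y, x_tmpl)
```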
Automatic Violin Synthesis Using Expressive Musical Term Features
The control of interpretational properties such as duration, vibrato, and dynamics is important in music performance. Musicians continuously manipulate such properties to achieve different expressive intentions. This paper presents a synthesis system that automatically converts a mechanical, deadpan interpretation into distinct expressions by controlling these expressive factors. Extending a prior work on expressive musical term (EMT) analysis, we derive a subset of essential features as the control parameters, such as the relative time position of the energy peak in a note and the mean temporal length of the notes. An algorithm is proposed to manipulate the energy contour (i.e., the dynamics) of a note. The intended expressions of the synthesized sounds are evaluated through the recognition ability of the machine model developed in the prior work. Ten musical expressions such as Risoluto and Maestoso are considered, and the evaluation is done on held-out music pieces. Our evaluations show that the machine recognizes the expressions of the synthetic versions more easily than those of real recordings by an amateur student. While a listening test is being prepared as a next step for further performance validation, this work represents, to the best of our knowledge, the first attempt to build and quantitatively evaluate a system for EMT analysis/synthesis.
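The sketch below illustrates one such manipulation: piecewise-linearly warping a note's amplitude envelope so that the energy peak lands at a target relative position. It is a simplified stand-in for the paper's energy-contour algorithm; the envelope and target position are illustrative.

```python
import numpy as np

def move_energy_peak(env, target_pos):
    """env: amplitude envelope (1-D array); target_pos in (0, 1)."""
    n = len(env)
    peak = int(np.argmax(env))
    tgt = int(target_pos * (n - 1))
    # piecewise-linear time remapping: [0, tgt] <- [0, peak], rest follows
    head = np.interp(np.linspace(0, peak, tgt + 1), np.arange(n), env)
    tail = np.interp(np.linspace(peak, n - 1, n - tgt), np.arange(n), env)
    return np.concatenate([head, tail[1:]])

env = np.hanning(1000) ** 2               # toy note envelope, peak at 50%
early = move_energy_peak(env, 0.2)        # peak moved to 20% of the note
```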
Analysis and Synthesis of the Violin Playing Style of Heifetz and Oistrakh
The same music composition can be performed in different ways, and differences in performance aspects can strongly change the expression and character of the music. Experienced musicians tend to have their own performance style, which reflects their personality, attitudes, and beliefs. In this paper, we present a data-driven analysis of the performance styles of two master violinists, Jascha Heifetz and David Fyodorovich Oistrakh, to find out how they differ. Specifically, from 26 gramophone recordings of each violinist, we compute features characterizing performance aspects including articulation, energy, and vibrato, and then compare their styles in terms of the accents and legato groups of the music. Based on our findings, we propose algorithms to synthesize solo violin recordings in the style of these two masters from scores, for music compositions both observed and unobserved in the analysis stage. To the best of our knowledge, this study represents the first attempt to computationally analyze and synthesize the playing style of master violinists.
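As an example of the kind of feature involved, the sketch below estimates vibrato rate and extent from a note's F0 contour by spectral analysis of its deviation in cents. These definitions are generic assumptions rather than the paper's exact formulation.

```python
import numpy as np

def vibrato_features(f0, frame_rate):
    """f0: F0 contour in Hz for one note; frame_rate: contour samples/second."""
    cents = 1200 * np.log2(f0 / np.mean(f0))      # pitch deviation in cents
    cents -= np.mean(cents)
    spec = np.abs(np.fft.rfft(cents * np.hanning(len(cents))))
    freqs = np.fft.rfftfreq(len(cents), d=1.0 / frame_rate)
    band = (freqs >= 3) & (freqs <= 10)           # typical vibrato rate range
    rate = freqs[np.argmax(spec * band)]          # Hz, strongest in-band peak
    extent = (cents.max() - cents.min()) / 2      # cents, half peak-to-peak
    return rate, extent

frame_rate = 100.0
t = np.arange(200) / frame_rate                   # a 2-second note
f0 = 440 * 2 ** (30 / 1200 * np.sin(2 * np.pi * 5.5 * t))  # 5.5 Hz vibrato
rate, extent = vibrato_features(f0, frame_rate)    # ~5.5 Hz, ~30 cents
```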